Reducing duplication of files on a network

ABSTRACT

Systems and techniques for improving the performance of a network system having one or more sending systems and one or more receiving systems may include determining the digital signature of a received digital file, comparing the digital signature against stored digital signatures of digital files accessible to the receiving system, and determining whether to store the received digital file and/or a location identifier for the stored version of the received digital file based on a result of the comparison.

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Application No. 60/334,578, filed Dec. 3, 2001, and entitled “REDUCING DUPLICATION OF FILES ON A NETWORK.”

TECHNICAL FIELD

[0002] The concepts and implementations relate generally to the storage of files in network systems.

BACKGROUND

[0003] Network systems enable communication of messages among computer systems. For example, network systems enable communication of files over the Internet. Increases in computer and Internet usage have resulted in an increased number of files exchanged, causing network resources to become increasingly taxed and difficult to operate and maintain. To complicate matters, attachments may be included with files being exchanged over a network, leading to the dedication of additional network resources to the communication and storage of particular files. In fact, a popular file (e.g., electronic mail message) and its attachment may be sent numerous times from a single source or from subsequent recipients of the file and its attachment.

SUMMARY

[0004] In one general aspect, a digital signature for a received file may be determined and that signature may be compared with stored digital signatures of digital files accessible to a network system to determine whether or not to store that received file.

[0005] Implementations may include one or more of the following features. For example, the digital signature for the received file and/or a location identifier for the file may be stored with the stored digital signatures when the digital signature does not correspond to a stored digital signature. The location identifier may be generated when the comparison reveals that the digital signature of the digital file does not correspond to any of the stored digital signatures. Implementations may include storing the location identifier when the file is received a number of times corresponding to a storage threshold. Implementations also may include replacing the received file with a location identifier when the digital signature corresponds to at least one of the stored digital signatures.

[0006] Determining the digital signature may include applying a hashing technique to all or less than all of a received file. Applying the hashing technique may include applying a proprietary algorithm, the MD5 (“Message Digest 5”) algorithm and/or the SHA (“Secure Hash Algorithm”) algorithm. Determining the digital signature also may include using one or more portions or parameters of the received file, and/or using the name and/or size of the file to determine the digital signature.

[0007] The content of the received file and the stored file may be verified, for example, by using all or part of the file name, the hash of the file, the size of the file, content in all or part of the file, or other means.

[0008] A counter may be used to monitor file usage and/or redundancy. For instance, a counter may be set to an initial value when the digital signature is added to the stored digital signatures. The counter may be incremented when the digital signature of a received file corresponds to the stored digital signature. By contrast, the counter may be decremented to effectively delete or to represent deletion of an instance of the digital file. The stored digital file, the stored digital signature, and/or the location identifier may be deleted when the counter falls below a file deletion threshold, a signature deletion threshold and a location identifier deletion threshold, respectively.

[0009] The digital file may include an electronic mail message and/or one or more attachments. The digital signature may include the digital signature of an attachment. Comparing digital signatures may include comparing digital signatures for attachments.

[0010] Determining whether to store the digital file may include determining whether the digital file has been replaced with a location identifier a number of times per stored instance that equals or exceeds a high volume threshold. When the digital file has not been replaced a number of times per stored instance greater than or equal to the high volume threshold, the location identifier for the previously-stored instance may be retrieved. When the digital file has been replaced a number of times equal to or greater than the high volume threshold, the digital file may be stored. This may include storing a location identifier for the stored digital file.

[0011] A received file may be separated into its constituent components using an apparatus with one or more electronic mailboxes. The electronic mailboxes may include one or more location identifiers useful in identifying content portions of electronic mail messages and/or attachments to those messages.

[0012] These and other aspects may be implemented by an apparatus and/or by a computer program stored on a computer readable medium such as a disc, a client device, a host device and/or a propagated signal. The apparatus that determines digital signatures may include a device physically distinct from other devices that receive the digital file. The apparatus may also forward digital signatures and/or have a local data store of signatures.

[0013] As such, details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

[0014] FIG. 1 is a block diagram illustrating an exemplary network system capable of reducing duplication of files on a network.

[0015] FIG. 2 is a block diagram illustrating an exemplary digital file which may be included in, constitute, or contain a file exchanged in a network system, such as that illustrated by FIG. 1.

[0016] FIG. 3 is a flow chart illustrating an exemplary process for receiving a file using a network system, such as that illustrated by FIG. 1.

[0017] FIG. 4 is a flow chart illustrating an exemplary process for receiving and processing a file using a network system such as that illustrated by FIG. 1, when the digital file has been received beyond a storage threshold number of times.

[0018] Like reference symbols in the various drawings may indicate like elements.

DETAILED DESCRIPTION

[0019] For illustrative purposes, FIGS. 1-4 illustrate a network system and techniques implemented by that system for receiving electronic files and reducing their duplication. Referring to FIG. 1, a network system 100 is structured and arranged to enable the exchange of files between a sending system 110 and a receiving system 130 through a network 120. For brevity, several elements in these figures are represented as monolithic entities. However, as would be understood by one skilled in the art, implementations of these elements may include numerous interconnected computers and components that are designed to perform a set of specified operations and/or that are dedicated to a particular geographical region. Furthermore, one or more of the elements illustrated by FIG. 1 may be operated jointly or independently by one or more organizations.

[0020] Each of the sending system 110 and the receiving system 130 may be implemented by, for example, a general-purpose computer capable of responding to and executing instructions in a defined manner, a personal computer, a special-purpose computer, a workstation, a server, a device, a component, other equipment or some combination thereof capable of responding to and executing instructions. The sending system 110 may be structured and arranged to receive instructions from, for example, a software application, a program, a piece of code, a device, a computer, a computer system, or a combination thereof, which independently or collectively direct operations, as described herein. The instructions may be embodied permanently or temporarily in any type of machine, component, equipment, storage medium, or propagated signal that is capable of being delivered to the sending system 110 or the receiving system 130.

[0021] The sending system 110 may include a communication interface (not shown), such as, for example, an electronic mail gateway. For instance, the sending system 110 may include a dedicated mailing system that is implemented by specialized hardware or executed by a general purpose processor capable of running various applications, such as electronic mailer programs, and capable of employing various file transfer protocols, such as the SMTP (“Simple Mail Transfer Protocol”). The communication interface of the sending system 110 enables communications between the sending system 110 and other systems through, for example, network 120.

[0022] The network 120 typically is structured and arranged to enable direct or indirect communications between the sending system 110 and the receiving system 130. Examples of the network 120 include the Internet, the World Wide Web, WANs (Wide Area Networks), LANs (Local Area Networks), analog or digital wired and wireless telephone networks (e.g., PSTN (“Public Switched Telephone Network”), ISDN (“Integrated Services Digital Network”), or xDSL (“Digital Subscriber Loop”)), radio, television, cable, satellite, and/or any other delivery mechanism for carrying data. The network 120 may include a direct link between the sending system 110 and the receiving system 130, or the network 120 may include one or more networks or subnetworks between them. Each network or subnetwork may include, for example, a wired or wireless data pathway capable of carrying and receiving data.

[0023] The receiving system 130 may be structured and arranged to form part of or include an information delivery system, such as, for example, electronic mail, the World Wide Web, an online service provider, and/or other analog or digital wired and/or wireless systems that enable communication or delivery of information.

[0024] As shown in FIG. 1, in one exemplary implementation, the receiving system 130 may include an intermediate system 132 and a user accessible system 134.

[0025] The intermediate system 132 may be structured and arranged to receive files from one or more sending systems 110 and to distribute received files to the user accessible system 134. These files may include, for example, electronic mail, attachments to electronic mail, or other files, as described below. The intermediate system 132 may include one or more SMTP relays 132 a, file segmentors 132 b, and/or data stores 132 c.

[0026] The SMTP relays 132 a may be structured and arranged to initially receive incoming files (e.g., electronic mail). They generally are configured to capture received SMTP traffic from a sending system 110 to avoid refusal of connections requested by a sending system 110. The SMTP relays 132 a may include one or more general purpose computing devices running SMTP-receiving applications or they may be implemented to varying degrees in specialized hardware implementations that are designed to receive files. The SMTP relays 132 a also may be implemented using one or more applications residing on a device consolidating one or more file receiving functions. In the implementation shown by FIG. 1, the SMTP relays 132 a are structured and arranged to communicate with one or more file segmentors 132 b.

[0027] The file segmentors 132 b may be structured and arranged to segment a digital file into its constituent parts including, for example, header information, content and attachments.

[0028] FIG. 2 illustrates a digital file 200 that includes header information 210, content 220, and attachments 230, although the digital file 200 may include only one of these components or some combination or subset of these components. The digital file 200 of FIG. 2 may represent an electronic mail message received from the sending system 110, where the sending system is configured to transmit electronic mail messages. In some implementations, the header information 210 may include identification information for the sender and/or the intended recipient. The content 220 may include a message having, for example, text formatted in plain text or any of various other formats including RTF (“Rich Text Format”) or other public and proprietary formatting techniques. The attachments 230 may include electronic documentation, holiday greetings, or other files formatted as text, images, video, audio, or otherwise.

[0029] The file segmentor 132 b may be structured and arranged to separate portions of the digital file received by receiving system 130 (e.g., through SMTP relays 132 a) into constituent parts and to associate those constituent parts with an identifier and/or an electronic mailbox associated with an identifier related to the digital files. The identifier may include a screen name, a user identification, an IP address or other information. In some implementations, the identifier may include authentication information, information associated with the online identification including mailbox parameters such as mailbox size, address book information, or status of mail sent or received. The identifier also may include other information, such as location identifiers (e.g., pointers, arrays, records) that identify other parts of the digital file. The identifier may be used to enable access to the information, content, and attachments associated with a particular identity, e.g., a sender. For instance, a user may access pointers for various digital files (e.g., electronic mail messages) based on a personal identifier, which may be known by the user or transparent to the user.

[0030] The file segmentor 132 b may separate particular or predesignated content of a received digital file from other sections of that digital file. As described with respect to FIG. 2, the content may be in any of various forms, such as a text message, a letter, or other information. For example, the content portion may include a letter instructing the recipient to “please see attached.”

[0031] The file segmentor 132 b may separate one or more attachments from the digital file received. In some implementations, this may include removing electronic documentation from an electronic mail message. For instance, an attachment with holiday greetings may be separated from a received electronic mail message and dynamically linked to that electronic mail message by a pointer.

[0032] The data store 132 c may be structured and arranged to enable searches of the digital files or portions of digital files separated by file segmentor 132 b against other digital files stored by or capable of communicating with, and thus accessible to, the receiving system 130.

[0033] The data store 132 c may be implemented by one or more general purpose computers running an operating system and an application. For example, the data store 132 c may be implemented as a group of servers running a general purpose operating system and several applications that search accessed or maintained digital signatures that correspond to stored digital files accessible to the receiving system. Implementations may include having the data store 132 c operate on a special purpose device running a reduced operating system. For example, the data store 132 c may include hardware designed to support large arrays of signatures and to return results of a search of those signatures.

[0034] In some implementations, the data store 132 c may be structured and arranged to be able to determine a digital signature for a digital file or some portion of an electronic mail message separated by file segmentor 132 b. However, in other implementations, this functionality may be implemented through a separate program or process residing on a separate server that includes or communicates with the file segmentor 132 b.

[0035] The data store 132 c may include processing capabilities that enable a comparison of the digital signature with stored digital signatures of digital files accessible to the receiving system. The data store 132 c may reside as a separate process or program running on a general-purpose device. Alternatively, the data store 132 c may be a specialized hardware device. Other implementations may feature this capability to compare the digital signature with the digital signatures of stored digital files residing on a shared device that performs limited functions. The device may have regional awareness of some stored digital signatures for files received by one or several devices. Other implementations may feature a data store 132 c with global awareness of all stored digital signatures. Some implementations of the data store 132 c may offer global awareness of stored digital signatures residing in several systems, and also may be structured and arranged to implement a local awareness in individual systems in the event of an outage.

[0036] The digital signatures of stored digital files accessible to the receiving system 130 may be stored as an array of values, an index, a dynamic list or other information stored locally at data store 132 c, remotely in a single device, or distributed across several devices. The digital signatures may be sorted or organized for faster comparisons. The user accessible system 134 generally is structured and arranged to enable access to files that have been sent to the receiving system 130 or that are otherwise accessible to that system. In the implementation shown in FIG. 1, the user accessible system 134 generally includes devices that store a digital file in its constituent parts. For instance, the user accessible system 134 may include a storage device 134 a for electronic mailbox information (e.g., header information), a storage device 134 b for content information, and a storage device 134 c for attachments.

[0037] In this manner, the receiving system 130 may be structured and arranged to reduce duplication of electronic files received. For example, if the intermediate system 132 determines that there are numerous instances of a file through a comparison of digital signatures or otherwise, a location identifier (e.g., a pointer, address, reference, or link) may be stored for one or more of the instances of the file rather than maintaining each copy of the file. For instance, an OSP (“online service provider”) may eliminate or replace duplicate attachments to received email by storing a pointer to other instances of the same attachment. More generally, subsequently-received digital files having the same attachment may be stored with a location identifier that points to an instance of the attachment previously received and/or stored, rather than repeatedly storing the same attachment.

[0038] FIG. 3 illustrates one implementation of a process for reducing duplication of digital files. For convenience, the process shown in FIG. 3 references particular componentry described with respect to FIG. 1. However, similar methodologies may be applied in other implementations where different componentry is used to define the structure of the system, or where the functionality is distributed differently among the components shown by FIG. 1.

[0039] Initially, a digital file is received by, for example, receiving system 130 from sending system 110 (step 305). In one implementation, the digital file received includes an electronic mail message and/or an attachment to that message.

[0040] A digital signature may be computed for the received file using various techniques (step 310). Generally, a digital signature is a unique “profile” or “fingerprint” of a digital file that identifies the digital file. The digital signature for a file may be computed, for example, by applying a hashing technique to all or less than all of the file. The output of the hashing technique is referred to as a hash value. Typically, the hash value is substantially smaller than the received digital file, and is generated from an algorithm in such a way that it is extremely unlikely that different data files will produce the same hash value. Examples of hashing techniques include, but are not limited to, the MD5 (“Message Digest 5”) family of algorithms and/or the SHA (“Secure Hash Algorithm”) family of algorithms.
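
By way of illustration only, the hashing of step 310 might be sketched in Python as follows. The function name digital_signature and the prefix_len parameter are assumptions introduced for this sketch; the description does not prescribe a particular language, library, or interface.

```python
import hashlib
from typing import Optional


def digital_signature(data: bytes, algorithm: str = "sha1",
                      prefix_len: Optional[int] = None) -> str:
    """Return a hex hash value for all of `data`, or for only its first
    `prefix_len` bytes (hashing "less than all" of the file)."""
    portion = data if prefix_len is None else data[:prefix_len]
    return hashlib.new(algorithm, portion).hexdigest()


attachment = b"holiday greetings"
sig_sha1 = digital_signature(attachment)                 # SHA family
sig_md5 = digital_signature(attachment, "md5")           # MD5
sig_partial = digital_signature(attachment, prefix_len=7)  # first 7 bytes only
```

Hashing only a prefix is cheaper but makes it more likely that different files share a signature, which is one reason the content may additionally be verified before a file is replaced.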

[0041] The digital signature for a data file may be computed at the receiving system, for example, at data store 132 c of receiving system 130, or it may be computed at the sending system before a data file is communicated. In the latter implementation, the digital signature optionally may be encrypted. For instance, in one implementation, the sending system 110 determines a digital signature for a digital file to be transmitted by applying a hashing technique to that digital file. Then, the digital file and the obtained hash value are encrypted and sent to the receiving system 130. Upon receiving the encrypted data from the sending system 110, the receiving system 130 decrypts the data file and the hash value using an appropriate key. To verify the integrity of the data file, the receiving system 130 may perform the same hashing technique applied by the sending system 110 and may compare the resulting hash value to the decrypted hash value. If the hash values are the same, the integrity of the data is presumed to have been preserved across the network 120, and the hash value is used as a digital signature for the file.
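
The integrity check described above might be sketched as follows. This is a simplified illustration under stated assumptions: the encryption and decryption steps are omitted entirely, SHA-256 is chosen arbitrarily as the hashing technique, and the function names package and accept are hypothetical.

```python
import hashlib
from typing import Optional, Tuple


def package(data: bytes) -> Tuple[bytes, str]:
    """Sending side: pair the file with its hash value.
    (Encryption of both items is omitted from this sketch.)"""
    return data, hashlib.sha256(data).hexdigest()


def accept(data: bytes, claimed_hash: str) -> Optional[str]:
    """Receiving side: recompute the hash with the same technique and
    compare. Return the hash for use as the digital signature, or None
    if the file appears to have been altered in transit."""
    recomputed = hashlib.sha256(data).hexdigest()
    return recomputed if recomputed == claimed_hash else None


payload, sent_hash = package(b"quarterly report")
signature = accept(payload, sent_hash)         # intact: signature is the hash
rejected = accept(b"quarterly rep0rt", sent_hash)  # altered: None
```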

[0042] Whether the digital signature is computed by the recipient and/or sender, the generation of the digital signature may be based on various information related to the digital file. For example, a name could be used in conjunction with the file size and a hash value. Other implementations may use a portion of those or different parameters.
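
One possible way to combine a name, a file size, and a hash value into a single signature is sketched below. The CompositeSignature type and the composite_signature function are assumptions made for illustration, as is the choice of MD5.

```python
import hashlib
from typing import NamedTuple


class CompositeSignature(NamedTuple):
    """Hypothetical signature made of several file parameters."""
    name: str
    size: int
    hash_value: str


def composite_signature(name: str, data: bytes) -> CompositeSignature:
    # A tuple is hashable, so it can serve directly as a lookup key
    # in a table of stored digital signatures.
    return CompositeSignature(name, len(data), hashlib.md5(data).hexdigest())


a = composite_signature("card.gif", b"holiday greetings")
b = composite_signature("card.gif", b"holiday greetings")
c = composite_signature("renamed.gif", b"holiday greetings")
```

Including the name makes the signature stricter: here a and b match, while c does not match even though its content is identical, so an implementation that wants to catch renamed duplicates might leave the name out.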

[0043] In another example, a received file may be separated into one or more constituent parts such that the digital signature is determined on one or more of the constituent parts. For instance, a digital file may be separated into header information 210, content 220, and attachments 230. The digital signature may be computed for one or more of the component parts, e.g., the attachments 230.
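
As a rough sketch of this separation, the following Python example uses the standard library email package to split a message into header information, content, and attachments, and then computes a signature for each attachment only. The segment function, the sample addresses, and the use of SHA-1 are assumptions made for illustration.

```python
import hashlib
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def segment(raw: bytes):
    """Split a raw message into (header info, content, attachments)."""
    msg = message_from_bytes(raw, policy=default)
    header = {"from": str(msg["From"]), "to": str(msg["To"]),
              "subject": str(msg["Subject"])}
    body = msg.get_body(preferencelist=("plain",))
    content = body.get_content() if body is not None else ""
    attachments = [part.get_payload(decode=True)
                   for part in msg.iter_attachments()]
    return header, content, attachments


# Build a small sample message to segment.
sample = EmailMessage()
sample["From"] = "sender@example.com"
sample["To"] = "recipient@example.com"
sample["Subject"] = "Greetings"
sample.set_content("please see attached")
sample.add_attachment(b"holiday greetings", maintype="application",
                      subtype="octet-stream", filename="card.bin")

header, content, attachments = segment(sample.as_bytes())
attachment_signatures = [hashlib.sha1(a).hexdigest() for a in attachments]
```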

[0044] Once the digital signature is computed for a digital file (step 310), the digital signature is compared with other digital signatures, for example, digital signatures of stored digital files accessible to the receiving system 130 (step 320). In some implementations, comparing the digital signature with the stored digital signatures involves a comparison of digital signatures for less than all aspects of a digital file. For example, the receiving system 130 may only attempt to check for duplicate attachments that are received as part of a digital file. In other implementations, the digital signatures for entire files may be compared.

[0045] If the computed digital signature is not among the stored digital signatures (step 320), the digital signature may be added to the stored digital signatures (step 325), along with a location identifier (step 345) for the received digital file, which is itself stored (step 355).

[0046] A counter may be used to indicate whether to delete instances of the digital file, for example, when the counter drops to or below a file deletion threshold. For example, if several users have deleted their user copy of the digital file (e.g., by deleting mail files in a mailbox), the counter may be decremented. In one implementation, when the counter reaches a signature deletion threshold, the digital signature may be removed from the stored digital signatures. In another implementation, when the counter drops below a location identifier deletion threshold, the location identifier may no longer be substituted for digital files. For example, the location identifier may be removed from the stored digital signatures.
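
The counter-driven deletion described above resembles reference counting, and might be sketched as follows. The CountedStore class, its method names, and the default threshold values of zero are assumptions made for this sketch; a location identifier deletion threshold could be handled in the same way and is omitted for brevity.

```python
from typing import Dict


class CountedStore:
    """Hypothetical store that tracks how many user copies refer to a file."""

    def __init__(self, file_deletion_threshold: int = 0,
                 signature_deletion_threshold: int = 0) -> None:
        self.file_deletion_threshold = file_deletion_threshold
        self.signature_deletion_threshold = signature_deletion_threshold
        self.files: Dict[str, bytes] = {}    # signature -> stored file
        self.counters: Dict[str, int] = {}   # signature -> live references

    def add_reference(self, signature: str, data: bytes) -> None:
        """A user receives the file: store it once, then count the reference."""
        self.files.setdefault(signature, data)
        self.counters[signature] = self.counters.get(signature, 0) + 1

    def release(self, signature: str) -> None:
        """A user deletes their copy: decrement, then apply the thresholds."""
        self.counters[signature] -= 1
        if self.counters[signature] <= self.file_deletion_threshold:
            self.files.pop(signature, None)
        if self.counters[signature] <= self.signature_deletion_threshold:
            del self.counters[signature]


store = CountedStore()
store.add_reference("sig-1", b"holiday greetings")
store.add_reference("sig-1", b"holiday greetings")
store.release("sig-1")
still_stored = "sig-1" in store.files      # one reference remains
store.release("sig-1")
finally_stored = "sig-1" in store.files    # last reference released
```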

[0047] Typically, if the digital signature is found among the stored digital signatures (step 320), the received digital file is replaced with a location identifier or pointer to the stored instance of the file to avoid duplication while enabling future access to the received digital file. For example, as described with respect to steps 330-360, if a digital signature corresponding to a digital signature for a received email attachment is found in the stored digital signatures, a location identifier corresponding to the stored digital signature may be accessed and stored as a pointer to a previously-stored instance of the email attachment rather than storing the received attachment redundantly.

[0048] More specifically, if the digital signature is found among the stored digital signatures accessible to the receiving system, the receiving system 130 verifies that the received digital file corresponds to the stored file to ensure that the files are the same prior to replacing the digital file with a location identifier (step 330). Examples of verifying content include, but are not limited to, examining and/or comparing attributes of the content such as its name or size, and/or data associated with the retrieved file.

[0049] In one implementation, once the content is verified (step 330), the received digital file is replaced with a location identifier that points to or otherwise identifies the previously-stored instance of the duplicative received digital file, thereby avoiding redundant storage of the same file.
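
The basic flow of steps 310 through 360 might be sketched as follows, using in-memory dictionaries as stand-ins for the stored digital signatures and the stored files. The DedupStore class, the "loc-N" identifier format, the use of SHA-1, and verification by full content comparison are all assumptions made for illustration.

```python
import hashlib
from typing import Dict


class DedupStore:
    """Hypothetical in-memory stand-in for the signature and file stores."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}     # location identifier -> file
        self.signatures: Dict[str, str] = {}  # signature -> location identifier

    def receive(self, data: bytes) -> str:
        """Return the location identifier under which `data` can be read."""
        signature = hashlib.sha1(data).hexdigest()        # step 310
        location = self.signatures.get(signature)         # step 320
        if location is not None and self.files[location] == data:  # step 330
            return location          # step 360: pointer only, nothing stored
        location = f"loc-{len(self.files)}"               # step 345
        self.files[location] = data                       # step 355
        self.signatures.setdefault(signature, location)   # step 325
        return location


mail_store = DedupStore()
first = mail_store.receive(b"holiday greetings")
second = mail_store.receive(b"holiday greetings")   # duplicate attachment
other = mail_store.receive(b"meeting agenda")
```

In this sketch the second receipt returns the same location identifier as the first, so only two files are stored for three receipts.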

[0050] In more sophisticated implementations, a counter may be used to indicate the number of times a digital file has been received and to limit replacement of duplicative digital files based on this number (see steps 340-350). Specifically, when a digital signature is added to the stored digital signatures, a counter associated with the digital signature may be set to an initial value. Each time the digital signature is found in the stored digital signatures, the counter is incremented (step 340). Generally, the receiving system 130 replaces the digital file with a location identifier after the signature is found in received files a storage threshold number of times (steps 350-360). In addition, the location identifier generally is stored when the counter is below the storage threshold (steps 350 and 345). For example, when the counter reaches the storage threshold (step 350), the digital file may be replaced with a location identifier to avoid duplication (step 360). However, before the storage threshold is reached (step 350), the location identifier may be stored (step 345) along with the digital file (step 355). That is, as will be described with respect to FIG. 4, to distribute load and/or provide some measure of redundancy, a receiving system 130 may continue to store a digital file after the digital signature is found in the stored digital signatures accessible to the receiving system (step 355).
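
A minimal sketch of the storage threshold follows. It assumes an initial counter value of one on first receipt and a threshold of three, and the ThresholdStore class and its attribute names are hypothetical.

```python
import hashlib
from typing import Dict, List


class ThresholdStore:
    """Hypothetical store that keeps copies until a storage threshold is met."""

    def __init__(self, storage_threshold: int = 3) -> None:
        self.storage_threshold = storage_threshold
        self.counters: Dict[str, int] = {}         # signature -> times received
        self.locations: Dict[str, List[str]] = {}  # signature -> stored copies
        self.files: Dict[str, bytes] = {}          # location identifier -> file

    def receive(self, data: bytes) -> str:
        signature = hashlib.sha1(data).hexdigest()
        count = self.counters.get(signature, 0) + 1          # step 340
        self.counters[signature] = count
        if count >= self.storage_threshold and signature in self.locations:
            return self.locations[signature][0]              # step 360
        location = f"loc-{len(self.files)}"                  # step 345
        self.files[location] = data                          # step 355
        self.locations.setdefault(signature, []).append(location)
        return location


threshold_store = ThresholdStore(storage_threshold=3)
returned = [threshold_store.receive(b"popular attachment") for _ in range(4)]
```

With a threshold of three, the first two receipts each store a copy, and the third and fourth receipts are replaced with the location identifier of the first copy.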

[0051] FIG. 4 shows a procedure 400 that may be used to reduce duplication of files and that includes storing more than one copy of a file. In some cases, it is advantageous to store more than one copy of a file on the receiving system 130. For example, several copies may be stored according to a predetermined ratio to implement load balancing. At high frequencies, the receiving system 130 may store additional instances of the file to handle the volume of requests even after the receiving system 130 has begun to replace the digital file with a location identifier. When the frequency diminishes, instances of the file may be removed.

[0052] Procedure 400 generally is used after a receiving system 130 has determined that the digital signature of a received digital file is among the stored digital signatures (step 320). As such, procedure 400 may be implemented in lieu of or in addition to steps 330, 340 and 350 to determine whether to store another instance of the received digital file or replace the received digital file with a location identifier for a previously-stored instance of that file.

[0053] Initially, as part of determining that the digital signature for the received file is found in the stored digital signatures, the counter is incremented (step 410).

[0054] The counter is checked to see if the digital file has been replaced a high volume threshold number of times with a location identifier per stored instance (step 410). If this is the case, the receiving system 130 then stores the location identifier (step 345) along with the newly stored instance of the digital file (step 355). Thus, subsequently finding the digital signature in the stored digital signatures distributes access to the digital file across more copies. If not, the receiving system 130 returns a location identifier for the previously-stored version of the digital file (step 440). The receiving system 130 replaces the digital file with the location identifier to avoid duplication (step 360).
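
The per-instance check of procedure 400 might be sketched as follows. The HighVolumeEntry class and handle_duplicate function are hypothetical, and dividing the replacement count by the number of stored instances is one assumed reading of "per stored instance."

```python
from typing import List, Tuple


class HighVolumeEntry:
    """Hypothetical bookkeeping for one stored digital signature."""

    def __init__(self, first_location: str) -> None:
        self.locations: List[str] = [first_location]  # stored instances
        self.replacements = 0  # times a pointer was handed out instead


def handle_duplicate(entry: HighVolumeEntry, new_location: str,
                     high_volume_threshold: int) -> Tuple[str, bool]:
    """Return (location identifier, whether a new instance was stored)."""
    per_instance = entry.replacements / len(entry.locations)
    if per_instance >= high_volume_threshold:
        # Demand per copy is high: store another instance of the file.
        entry.locations.append(new_location)
        return new_location, True
    # Otherwise replace the file with a pointer to a stored instance.
    entry.replacements += 1
    return entry.locations[-1], False


entry = HighVolumeEntry("loc-0")
results = [handle_duplicate(entry, f"new-{i}", 2) for i in range(4)]
```

With a threshold of two, the first two duplicates are replaced with pointers to "loc-0", the third triggers storage of a second instance, and the fourth is again replaced with a pointer because the ratio has dropped back below the threshold.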

[0055] Some implementations may manage high demand conditions by storing multiple instances of the stored digital signatures, and/or including multiple receiving systems 130. In high demand conditions, the multiple stored files, multiple stored digital signatures, and/or multiple receiving systems 130 are accessible to users (e.g., through a round robin assignment). For example, when multiple instances of a file are stored, the receiving system 130 may alternate assignment of location identifiers among the stored instances.
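
A round robin assignment of location identifiers might be sketched as follows; the RoundRobinAssigner class is an assumption made for illustration.

```python
from itertools import cycle
from typing import Iterable


class RoundRobinAssigner:
    """Hypothetical helper that alternates among stored instances."""

    def __init__(self, locations: Iterable[str]) -> None:
        # cycle() repeats the stored instances in order, indefinitely.
        self._cycle = cycle(list(locations))

    def next_location(self) -> str:
        return next(self._cycle)


assigner = RoundRobinAssigner(["loc-0", "loc-1"])
assigned = [assigner.next_location() for _ in range(5)]
```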

[0056] Other implementations may initially add a digital signature to the stored digital signatures but will only replace the digital file with a location identifier based on frequency, such as when the digital signature is found in the stored digital signatures more than a high volume threshold number of times during a given period of time. For example, the counter that keeps track of the number of times a digital file is received could be reset to an initial value every time the high volume threshold is reached, at which point another instance of the file is stored. In another example, the receiving system 130 may replace a file with a location identifier when the file has been received or requested at least five hundred times in a one-hour period.
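
A frequency test such as "at least five hundred times in a one-hour period" might be sketched with a sliding window of receipt times. The FrequencyGate class is hypothetical, and time is passed in explicitly as seconds so that the sketch does not depend on a clock.

```python
from collections import deque
from typing import Deque


class FrequencyGate:
    """Hypothetical sliding-window counter for one digital signature."""

    def __init__(self, threshold: int = 500,
                 window_seconds: float = 3600.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._times: Deque[float] = deque()

    def record(self, now: float) -> bool:
        """Record one receipt at time `now` (seconds) and report whether
        the file should now be replaced with a location identifier."""
        self._times.append(now)
        # Drop receipts that have aged out of the window.
        while now - self._times[0] >= self.window_seconds:
            self._times.popleft()
        return len(self._times) >= self.threshold


gate = FrequencyGate(threshold=3, window_seconds=10.0)
decisions = [gate.record(t) for t in (0.0, 1.0, 2.0, 20.0)]
```

In this small example the third receipt within ten seconds meets the threshold, while the receipt at 20 seconds arrives after the earlier ones have expired.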

[0057] In some implementations, each digital file, the constituent parts of a digital file or digital signatures associated with a digital file may include an associated time stamp. For example, when a digital signature is added to the stored digital signatures, a time stamp may indicate when the digital signature was added. The time stamp may be used to keep the stored digital signatures current, and subsequent matches to the digital signature may update the time stamp. The time stamp also may be used to remove digital signatures corresponding to files that are not frequently and/or recently requested.
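
Time stamp maintenance might be sketched as follows. The TimestampedSignatures class, its method names, and the max_age parameter are assumptions made for illustration; times are plain numbers in seconds.

```python
from typing import Dict, List


class TimestampedSignatures:
    """Hypothetical table of stored signatures with last-match time stamps."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, float] = {}  # signature -> time stamp

    def touch(self, signature: str, now: float) -> None:
        """Add a signature, or refresh its time stamp on a later match."""
        self.last_seen[signature] = now

    def purge(self, now: float, max_age: float) -> List[str]:
        """Remove and return signatures not matched within `max_age`."""
        stale = [s for s, t in self.last_seen.items() if now - t > max_age]
        for signature in stale:
            del self.last_seen[signature]
        return stale


table = TimestampedSignatures()
table.touch("sig-old", now=0.0)
table.touch("sig-new", now=90.0)
removed = table.purge(now=100.0, max_age=60.0)
```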

[0058] The methods, devices and programs of the receiving system may be implemented in hardware or software, or a combination of both. In some implementations, the methods, devices and programs are implemented in computer programs executing on programmable computers each with at least one processor, a data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices.

[0059] The methods, devices and programs of the receiving system may be implemented as a computer program storable on a medium that can be read by a computer system, such as receiving system 130, configured to provide the functions described herein. While the methods, devices and programs are described as if executed on a separate processor, the methods, devices and programs may be implemented as a software process executed by one or more receiving systems 130.

[0060] Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language.

[0061] Each such computer program may be stored on a storage medium or device (e.g., ROM (“Read Only Memory”) or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described herein. The computer readable medium can also be a propagated signal. The receiving system 130 may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.

[0062] A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, although the methods, devices and programs have been described in the context of a wide area public network, the methods, devices and programs can be applied to any network (including private wide area and local area networks) in which files transmitted from one node are transmitted to a receiving processor that can be programmed or configured as a receiving system.

[0063] Other implementations are within the scope of the following claims.

What is claimed is:
 1. A method for reducing duplication of files in a network system including one or more sending systems and one or more receiving systems, the method comprising: determining a digital signature for a digital file received by a receiving system; comparing the digital signature against stored digital signatures of stored digital files accessible by the receiving system; and determining whether to store the digital file based on the comparison of the digital signature of the digital file against the stored digital signatures of the digital files accessible by the receiving system.
 2. The method of claim 1 wherein the digital file comprises an electronic mail message.
 3. The method of claim 2 wherein the digital file includes an attachment in an electronic mail message.
 4. The method of claim 3 wherein determining the digital signature for the digital file includes determining the digital signature of the attachment.
 5. The method of claim 1 further comprising storing the digital file when a result of the comparison reveals that the digital signature of the digital file does not correspond to any of the stored digital signatures accessible by the receiving system.
 6. The method of claim 6 wherein the digital signature for the digital file and the stored digital signatures are compared by comparing the digital signature for the attachment with the stored digital signatures corresponding to attachments of the digital files accessible by the receiving system.
 7. The method of claim 6 further comprising storing the digital signature when the result of the comparison reveals that the digital signature of the digital file does not correspond to any of the stored digital signatures.
 8. The method of claim 6 further comprising generating a location identifier for the digital file indicating a location of the digital file when the result of the comparison indicates that the digital signature of the digital file does not correspond to any of the stored digital signatures.
 9. The method of claim 8 further comprising storing the location identifier if the file has been received more than a storage threshold number of times.
 10. The method of claim 1 further comprising storing a location identifier for a previously-stored digital file corresponding to one of the stored digital signatures when the result of the comparison reveals that the one stored digital signature matches the digital signature for the digital file received.
 11. The method of claim 10 further comprising not redundantly storing the digital file when a result of the comparison reveals that the digital signature of the digital file corresponds to at least one of the stored digital signatures.
 12. The method of claim 1 wherein determining the digital signature includes applying a hashing technique to all or part of all of the digital file.
 13. The method of claim 12 wherein applying the hashing technique includes applying an MD5 algorithm to the digital file.
 14. The method of claim 12 wherein applying the hashing technique includes applying a version of an SHA algorithm to the digital file.
 15. The method of claim 1 wherein the digital signature is determined from less than all of the digital file.
 16. The method of claim 1 wherein the digital signature is determined based on a name of the digital file.
 17. The method of claim 1 wherein the digital signature is determined based on a size of the digital file.
 18. The method of claim 1 further comprising verifying that the digital file received by the receiving system corresponds to a stored digital file.
 19. The method of claim 18 wherein verifying that the digital file corresponds to the stored digital file includes verifying that at least a portion of a name of the digital file corresponds to at least a portion of a name of the stored digital file.
 20. The method of claim 18 wherein verifying that the digital file corresponds to the stored digital file includes verifying based on a size of the digital file.
 21. The method of claim 18 wherein verifying that the digital file corresponds to the stored digital file includes verifying based on a hash performed on the digital file.
 22. The method of claim 18 wherein verifying that the digital file corresponds to the stored digital file includes verifying based on data in the digital file.
 23. The method of claim 1 further comprising adding a counter set to an initial value when adding the digital signature to the stored digital signatures.
 24. The method of claim 23 further comprising incrementing the counter when the digital signature is determined to match one of the stored digital signatures.
 25. The method of claim 23 further comprising decrementing the counter when a user deletes a user copy of the digital file.
 26. The method of claim 23 further comprising deleting the digital file when the counter is decremented below a file deletion threshold.
 27. The method of claim 23 further comprising removing the digital signature from the stored digital signatures of stored digital files when the counter falls below a signature deletion threshold.
 28. The method of claim 23 further comprising deleting the location identifier when the counter is decremented below a location identifier threshold.
 29. The method of claim 1 wherein determining whether to store the digital file includes determining whether the digital file has been replaced with a location identifier a high volume threshold number of times per stored instance.
 30. The method of claim 29 further comprising getting the location identifier for a previously-stored version of the digital file when the digital file has not been replaced a high volume threshold number of times per stored instance.
 31. The method of claim 30 further comprising replacing the digital file with the location identifier.
 32. The method of claim 29 further comprising storing the digital file when the digital file has been replaced a high volume threshold number of times per stored instance.
 33. The method of claim 32 further comprising storing the location identifier for the stored digital file.
 34. An apparatus for reducing duplication of files in a network system, the apparatus comprising: an interface structured and arranged to receive a digital file; at least one signature processor structured and arranged to determine a digital signature of the digital file; a comparing device structured and arranged to compare the digital signature against stored digital signatures of digital files accessible by the receiving system; and at least one decision processor that is structured and arranged to determine whether to store the digital file based on a result of the comparison performed by the comparing device.
 35. The apparatus of claim 34 wherein the digital file includes an electronic mail message.
 36. The apparatus of claim 34 wherein the decision processor is structured and arranged to store the digital file when the result of the comparison reveals that the digital signature of the digital file does not correspond to any of the stored digital signatures.
 37. The apparatus of claim 36 wherein the decision processor is structured and arranged to add the digital signature to the stored digital signatures when the result of the comparison performed by the comparing device reveals that the digital signature of the digital file does not correspond to any of the stored digital signatures.
 38. The apparatus of claim 36 wherein the decision processor is structured and arranged to create a location identifier for the digital file indicating a location of the digital file when the result of the comparison performed by the comparing device reveals that the digital signature of the digital file does not correspond to any of the stored digital signatures.
 39. The apparatus of claim 38 wherein the decision processor is structured and arranged to store the location identifier with the digital signature of the digital file after files with the digital signature have been received more than a storage threshold number of times.
 40. The apparatus of claim 34 wherein the decision processor is structured and arranged to store the digital file when the comparison reveals that the digital signature of the digital file is found in the stored digital signatures of the digital files accessible to the receiving system.
 41. The apparatus of claim 40 wherein the decision processor is structured and arranged to not redundantly store a location identifier for a digital file corresponding to one of the digital signatures when the result of the comparison reveals that the one stored digital file signature matches the digital signature for the digital file received.
 42. The apparatus of claim 34 wherein the signature processor is structured and arranged to determine the digital signature by applying a hashing technique to all or part of all of the digital file.
 43. The apparatus of claim 42 wherein the signature processor is structured and arranged to determine the digital signature by applying an MD5 algorithm to the digital file.
 44. The method of claim 42 wherein the signature processor is structured and arranged to determine the digital signature by applying a version of an SHA algorithm to the digital file.
 45. The apparatus of claim 34 wherein the signature processor is structured and arranged to determine the digital signature from less than all of the digital file.
 46. The apparatus of claim 34 wherein the signature processor is structured and arranged to determine the digital signature based on a name of the digital file.
 47. The apparatus of claim 34 wherein the signature processor is structured and arranged to determine the digital signature based on a size of the digital file.
 48. The apparatus of claim 34 wherein the signature processor is structured and arranged to verify that the digital file received corresponds to a stored digital file.
 49. The apparatus of claim 34 wherein the decision processor is structured and arranged to include adding a counter set to an initial value when the digital signature of the digital file is added to the stored digital signatures of digital files.
 50. The apparatus of claim 34 further comprising a user interface enabling a user to access the digital file.
 51. The apparatus of claim 34 wherein the user interface includes an electronic mailbox.
 52. The apparatus of claim 51 wherein the electronic mailbox includes one or more location identifiers.
 53. The apparatus of claim 34 further comprising one or more SMTP relays.
 54. The apparatus of claim 34 further comprising a file separator structured and arranged to separate the digital file into one or more constituent components.
 55. The apparatus of claim 54 wherein at least one of the constituent components includes header information.
 56. The apparatus of claim 54 wherein at least one of the constituent components is content of an electronic mail message.
 57. The apparatus of claim 54 wherein at least one of the constituent components is an attachment.
 58. The apparatus of claim 54 wherein the device is structured and arranged to create a link between more than one constituent component of the digital file.
 59. The apparatus of claim 58 wherein the link includes a location identifier.
 60. The apparatus of claim 34 wherein the signature processor is structured and arranged to determine the digital signature of the digital file received by the receiving system on a device physically distinct from the interface structured and arranged to receive a digital file.
 61. The apparatus of claim 60 wherein the signature processor forwards one or more digital signatures to a data store of digital signatures for digital files accessible to the receiving system.
 62. The apparatus of claim 34 wherein a local receiving system in a group of two or more receiving systems maintains a local data store of digital signatures corresponding to digital files received by the local receiving system.
 63. A computer program for reducing duplication of files in a network system, comprising one or more sending nodes, and one or more receiving systems, stored on a computer readable medium, comprising: a signature processing code segment that is operable to make a computer processor determine a digital signature for a digital file received by a receiving system; a comparing code segment that is operable to make a computer processor compare the digital signature against stored digital signatures of digital files accessible by the receiving system; and a decision code segment that is operable to make a computer processor determine whether to store the digital file based on a result of the comparison performed by the comparing code segment.
 64. The computer program of claim 63 wherein the decision code segment is structured and arranged to add the digital signature to the stored digital signatures when the result of the comparison performed by the comparing code segment reveals that the digital signature of the digital file does not correspond to any of the stored digital signatures.
 65. The computer program of claim 63 wherein the decision code segment is structured and arranged to create a location identifier for the digital file indicating a location of the digital file when the result of the comparison performed by the comparing code segment reveals that the digital signature of the digital file does not correspond to any of the stored digital signatures.
 66. The computer program of claim 65 wherein the decision code segment is structured and arranged to store the location identifier with the digital signature of the digital file after files with the digital signature have been received more than a storage threshold number of times.
 67. The computer program of claim 63 wherein the decision code segment is structured and arranged to store the digital file when the comparison reveals that the digital signature of the digital file is found in the stored digital signatures of the digital files accessible to the receiving system.
 68. The computer program of claim 67 wherein the decision code segment is structured and arranged to store a location identifier for a digital file corresponding to one of the digital signatures when the result of the comparison reveals that the one stored digital file signature matches the digital signature for the digital file received.